[develop] Update build environments for latest version of weather model (hash 2f1c8e1) #140
Merged
mkavulich
merged 14 commits into
ufs-community:develop
from
mkavulich:update-weather-model
Jun 2, 2021
…l, update 32bit build to actually work
…n, including esmf/8_1_0_beta_snapshot_47
mkavulich
added a commit
to ufs-community/regional_workflow
that referenced
this pull request
May 25, 2021
…A_3km pre-defined domain, update timestep and MPI settings (#492)

## DESCRIPTION OF CHANGES:
This PR accomplishes three things:
1. A new pre-defined domain (RRFS_NA_3km) has been added to the SRW App. Nodes/core settings must be modified for chgres_cube and post due to the size of this domain. A WE2E test was added and more information on all of these settings can be found within the related config.sh script (tests/baseline_configs/config.grid_RRFS_NA_3km.sh).
2. The default k_split value is updated for a faster model integration. With k_split=2, we see model integration ~30% faster than the previous settings for the same weather model hash. **This will not affect physics suites that have specified other k_split values**
3. In order to properly run the above domain with the intended FV3_RRFS_v1alpha physics suite, the weather model needed to be updated to a more recent hash. This more up-to-date weather model version also has renamed the FV3_GFS_v16beta suite to FV3_GFS_v16; this required a number of changes to the workflow and end-to-end tests.

In addition, several changes to default settings are occurring in this PR. Changes have also been made to k/n_split values in the namelist template which optimize run time. CPUS_PER_TASK_RUN_FCST is changed from "4" to "2" in this PR. Setting this field to "4" was doubling the requested nodes for the run_fcst task. For example, a 3-km CONUS run that normally requests 25 nodes (based on predefined layout_x/y values) was asking for 50, simply because CPUS_PER_TASK_RUN_FCST=4, which was unacceptable. When it is set to "2", the number of nodes remains unchanged, in line with the layout_x/y values. EMC is using CPUS_PER_TASK_RUN_FCST=2 for their runs, so this should be uncontroversial.

This PR will need to be accompanied by changes in the ufs-srweather-app for updating the weather model hash and incorporating some necessary build changes (including compiling with 32-bit reals by default); this PR has been created (ufs-community/ufs-srweather-app#140) but it is still a draft pending some platform-specific fixes and the merger of this PR.

## TESTS CONDUCTED:
For the initial changes for PR #480, multiple tests on Hera were run, including a full 36-hr forecast here: /scratch2/BMC/det/beck/FV3-LAM/expt_dirs/test_RRFS_NA_3km_36hr

With the additional changes and updates to the weather model, and updates to the Hera environment file, all end-to-end tests (aside from nco tests) were run on Hera (intel). There were a few pre-existing failures, and aside from an occasional GST failure due to wallclock time issues (see #490) the only new failures were for grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GSD_SAR, grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR, and grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta, which all had a new failure in make_ics and make_lbcs. Currently investigating this issue, though it is almost certainly related to the build environments which need to be addressed in ufs-community/ufs-srweather-app#140

## CONTRIBUTORS:
@JeffBeck-NOAA authored the half of these changes originating from #480, and offered the following credits on his original PR: Thanks are due to @JamesAbeles-NOAA for his recommendations for build/namelist changes and help troubleshooting run times. Thanks to @BenjaminBlake-NOAA and @JacobCarley-NOAA for their help with the domain configuration.
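The node-count effect of CPUS_PER_TASK_RUN_FCST described in the commit message above can be illustrated with a small sketch. This is not the workflow's actual calculation. The function name, the task count of 500, and the 40 cores per node are all assumptions, picked only so that the numbers reproduce the 25-node versus 50-node example.

```python
def nodes_requested(mpi_tasks: int, cpus_per_task: int, cores_per_node: int) -> int:
    """Nodes needed when every MPI task reserves cpus_per_task cores.

    Hypothetical helper for illustration; uses integer ceiling division.
    """
    total_cores = mpi_tasks * cpus_per_task
    return -(-total_cores // cores_per_node)


# Assumed values (not taken from the workflow): 500 MPI tasks derived from
# layout_x/layout_y plus write tasks, on nodes with 40 cores each.
MPI_TASKS = 500
CORES_PER_NODE = 40

for cpus_per_task in (2, 4):
    nodes = nodes_requested(MPI_TASKS, cpus_per_task, CORES_PER_NODE)
    print(f"CPUS_PER_TASK_RUN_FCST={cpus_per_task}: {nodes} nodes")
```

Under these assumptions the request grows linearly with the per-task core reservation, which is why raising the setting from "2" to "4" doubles the nodes without changing the layout.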
JeffBeck-NOAA
approved these changes
May 28, 2021
Tested on Hera with a 3-km CONUS domain using the RRFS_v1alpha SDF. A 6-hr run completed without a problem. Approving!
mkavulich
added a commit
to mkavulich/ufs-srweather-app
that referenced
this pull request
Aug 26, 2022
…A_3km pre-defined domain, update timestep and MPI settings (ufs-community#492)
mkavulich
added a commit
that referenced
this pull request
Sep 8, 2022
…A_3km pre-defined domain, update timestep and MPI settings (#492)
natalie-perlin
pushed a commit
to natalie-perlin/ufs-srweather-app
that referenced
this pull request
Jun 2, 2024
…with GNU compiler (segmentation fault) (ufs-community#140)
* tests/rt_gnu.conf: turn off IPD tests, no longer working on Cheyenne with GNU compiler (segmentation fault)
* update of jet modulefile and correction of cmake name from jet to jet.intel (to match gnumake)
DESCRIPTION OF CHANGES:
This PR (along with https://github.com/NOAA-EMC/regional_workflow/pull/492) updates the UFS SRW App to work with a more up-to-date hash of the ufs-weather-model (ufs-community/ufs-weather-model@2f1c8e1).
TESTS CONDUCTED:
Ran full suite of tests on Hera (aside from nco tests) with updated environment files (/scratch2/BMC/det/kavulich/workdir/update_app_master/step-by-step/expt_dirs/). The following failures were noted:
invalid reference to variable in NAMELIST input
The new failures occur only for older versions of NAM input; this is due to a change in the weather model, and may need to be handled in a separate PR.
Ran several end-to-end tests on Cheyenne (Intel 19.1.1) and Jet. Also ran the Graduate Student Test case on Orion. No failures outside of those outlined above.
Tests have not been run on WCOSS platforms; these will likely fail unless the platforms are updated to the latest ESMF modules, but I do not have access to update and test them.
ISSUE:
Fixes issue #134